Efforts to improve the adversarial robustness of convolutional neural networks have primarily focused on developing more effective adversarial training methods. In contrast, little attention has been devoted to analyzing the role of architectural elements (such as topology, depth, and width) in adversarial robustness. This paper seeks to bridge this gap and presents a holistic study on the impact of architectural design on adversarial robustness. We focus on residual networks and consider architecture design at the block level, i.e., topology, kernel size, activation, and normalization, as well as at the network scaling level, i.e., depth and width of each block in the network. In both cases, we first derive insights through systematic ablative experiments. Then we design a robust residual block, dubbed RobustResBlock, and a compound scaling rule, dubbed RobustScaling, to distribute depth and width at the desired FLOP count. Finally, we combine RobustResBlock and RobustScaling and present a portfolio of adversarially robust residual networks, RobustResNets, spanning a broad spectrum of model capacities. Experimental validation across multiple datasets and adversarial attacks demonstrates that RobustResNets consistently outperform both standard wide residual networks (WRNs) and other existing robust architectures, achieving state-of-the-art AutoAttack robust accuracy of 61.1% without additional data and 63.7% with 500K external data, while being $2\times$ more compact in terms of parameters. Code is available at \url{https://github.com/zhichao-lu/robust-residual-network}.
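To make the idea of a compound scaling rule concrete, the sketch below shows a generic, EfficientNet-style scaling that jointly grows depth and width so that total FLOPs scale by a target factor. The function name `compound_scale` and the coefficients `alpha` and `beta` are illustrative assumptions; the actual RobustScaling coefficients are derived in the paper from robustness-oriented ablations and are not reproduced here.

```python
import math

def compound_scale(base_depths, base_widths, flop_ratio, alpha=1.2, beta=1.1):
    """Illustrative compound scaling: jointly grow per-stage depth and width so
    that total FLOPs scale by roughly `flop_ratio`.

    FLOPs of a convolutional stage scale roughly as depth * width^2, so the
    exponent phi is chosen such that (alpha * beta**2) ** phi ~= flop_ratio.
    `alpha` (depth growth) and `beta` (width growth) are placeholders, not the
    paper's RobustScaling values.
    """
    phi = math.log(flop_ratio) / math.log(alpha * beta ** 2)
    depths = [max(1, round(d * alpha ** phi)) for d in base_depths]
    widths = [int(round(w * beta ** phi)) for w in base_widths]
    return depths, widths

# Example: scale a WRN-28-10-like layout (3 stages) to roughly 2x the FLOPs.
depths, widths = compound_scale([4, 4, 4], [160, 320, 640], flop_ratio=2.0)
print(depths, widths)
```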
Structural failures are often caused by catastrophic events such as earthquakes and high winds. It is therefore crucial to predict dynamic stress distributions during such highly disruptive events in real time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high computational complexity. Therefore, to reduce computational cost while maintaining accuracy, a Physics-Informed Neural Network (PINN), the PINN-Stress model, is proposed to predict the entire sequence of stress distributions based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into the deep neural network's loss function to incorporate information from both measurements and the PDE. The PINN-Stress model can predict the sequence of stress distributions in near real time and generalizes better than an equivalent model trained without the physics-informed loss.
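A minimal sketch of the idea of embedding a PDE residual into a network's loss via automatic differentiation is given below, using PyTorch and a 1D wave-type equation as a stand-in for the elastodynamic stress equations actually used by PINN-Stress. The network architecture, the stand-in PDE, and the random placeholder data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Small fully connected net mapping (x, t) -> displacement u(x, t)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def pde_residual(model, x, t, c=1.0):
    # Stand-in governing equation: u_tt - c^2 * u_xx = 0 (1D wave equation).
    # PINN-Stress embeds the elastodynamic equations instead; the mechanism
    # (automatic differentiation through the network) is the same.
    x, t = x.requires_grad_(True), t.requires_grad_(True)
    u = model(x, t)
    grad = lambda y, v: torch.autograd.grad(y, v, torch.ones_like(y), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_tt, u_xx = grad(u_t, t), grad(u_x, x)
    return u_tt - c ** 2 * u_xx

model = PINN()
# Placeholder data standing in for Finite Element simulation snapshots.
x_data, t_data, u_data = torch.rand(256, 1), torch.rand(256, 1), torch.rand(256, 1)
x_col, t_col = torch.rand(1024, 1), torch.rand(1024, 1)  # collocation points

# Total loss = data misfit + PDE residual penalty.
loss = nn.functional.mse_loss(model(x_data, t_data), u_data) \
     + pde_residual(model, x_col, t_col).pow(2).mean()
loss.backward()
```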
Structural monitoring of complex built environments often suffers from mismatches between design, laboratory testing, and as-built structural parameters. Moreover, real-world structural identification problems face many challenges: for example, the lack of accurate baseline models, high dimensionality, and complex multivariate partial differential equations (PDEs) pose significant difficulties for training and learning with conventional data-driven algorithms. This paper explores a new framework for structural identification, called NeuralSI, which augments the PDEs governing structural dynamics with neural networks. Our approach seeks to estimate nonlinear parameters from the governing equations. We consider the vibration of a nonlinear beam with two unknown parameters, one representing geometric and material variations and the other representing energy losses in the system, captured primarily through damping. The data for parameter estimation is obtained from a limited set of measurements, which favors applications in structural health monitoring, where the exact state of an existing structure is typically unknown and only a limited number of data samples can be collected in the field. The trained model can also be used with the identified structural parameters under both standard and extreme conditions. We compare against purely data-driven neural networks and other classical physics-informed neural networks (PINNs). Our approach reduces interpolation and extrapolation errors in the displacement distribution by two to five orders of magnitude over the baselines. Code is available at https://github.com/human-analysis/neural-structural-identification.
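The distinctive step here, compared to a standard PINN, is that the unknown physical coefficients are estimated jointly with the network. The sketch below illustrates that idea in PyTorch with a simplified second-order stand-in for the nonlinear beam equation; the parameter names `stiffness` and `damping`, the simplified dynamics, and the placeholder data are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Unknown physical parameters, learned jointly with the displacement network.
# `stiffness` stands in for the geometric/material variation and `damping`
# for the energy-loss term described in the abstract (names are illustrative).
stiffness = nn.Parameter(torch.tensor(1.0))
damping = nn.Parameter(torch.tensor(0.1))

u_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def residual(x, t):
    # Simplified stand-in dynamics: u_tt + damping * u_t + stiffness * u = 0.
    # NeuralSI works with the full nonlinear beam PDE; this only shows how
    # unknown coefficients enter the physics residual.
    x, t = x.requires_grad_(True), t.requires_grad_(True)
    u = u_net(torch.cat([x, t], dim=1))
    g = lambda y, v: torch.autograd.grad(y, v, torch.ones_like(y), create_graph=True)[0]
    u_t = g(u, t)
    u_tt = g(u_t, t)
    return u_tt + damping * u_t + stiffness * u

opt = torch.optim.Adam(list(u_net.parameters()) + [stiffness, damping], lr=1e-3)
x_m, t_m, u_m = torch.rand(64, 1), torch.rand(64, 1), torch.rand(64, 1)  # sparse measurements (placeholder)
x_c, t_c = torch.rand(512, 1), torch.rand(512, 1)                        # collocation points
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(u_net(torch.cat([x_m, t_m], dim=1)), u_m) \
         + residual(x_c, t_c).pow(2).mean()
    loss.backward()
    opt.step()
```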
This paper proposes a non-interactive end-to-end solution for secure fusion and matching of biometric templates using fully homomorphic encryption (FHE). Given a pair of encrypted feature vectors, we perform the following ciphertext operations: i) feature concatenation, ii) fusion and dimensionality reduction through a learned linear projection, iii) scaling to unit $\ell_2$-norm, and iv) match score computation. Our method, dubbed HEFT (Homomorphically Encrypted Fusion of biometric Templates), is custom-designed to overcome the unique constraint imposed by FHE, namely the lack of support for non-arithmetic operations. From an inference perspective, we systematically explore different data packing schemes for computationally efficient linear projection and introduce a polynomial approximation for scale normalization. From a training perspective, we introduce an FHE-aware algorithm for learning the linear projection matrix to mitigate errors induced by approximate normalization. Experimental evaluation of template fusion and matching for face and voice biometrics shows that HEFT (i) improves biometric verification performance by 11.07% and 9.58% AUROC compared to the respective unibiometric representations while compressing the feature vectors by a factor of 16 (512D to 32D), and (ii) fuses a pair of encrypted feature vectors and computes match scores against a gallery of 1024 in 884 ms. Code and data are available at https://github.com/human-analysis/encrypted-biometric-fusion.
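The sketch below mirrors the four ciphertext operations in plaintext NumPy (nothing is actually encrypted), with a Newton-iteration-style polynomial approximation of the inverse square root standing in for FHE-compatible scale normalization. The projection matrix `P`, the helpers `inv_sqrt_poly` and `heft_like_fuse`, and the pre-scaling that keeps the squared norm near 1 are illustrative assumptions, not HEFT's exact packing scheme or training algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder fusion/projection matrix, scaled so projected squared norms are ~1,
# since the polynomial normalization is only accurate on a bounded input range.
P = rng.standard_normal((32, 1024)) / np.sqrt(1024 * 32)

def inv_sqrt_poly(x, x0=1.0, iters=3):
    # Newton iteration for 1/sqrt(x) using only additions and multiplications,
    # mirroring the arithmetic-only constraint of FHE.
    y = x0
    for _ in range(iters):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def heft_like_fuse(f1, f2):
    z = np.concatenate([f1, f2])   # i)   feature concatenation
    z = P @ z                      # ii)  fusion + dimensionality reduction (512D+512D -> 32D)
    z = z * inv_sqrt_poly(z @ z)   # iii) approximate scaling to unit l2-norm
    return z

def match_scores(probe, gallery):
    return gallery @ probe         # iv)  inner-product match scores

f_face, f_voice = rng.standard_normal(512), rng.standard_normal(512)
probe = heft_like_fuse(f_face, f_voice)
gallery = np.stack([heft_like_fuse(rng.standard_normal(512), rng.standard_normal(512))
                    for _ in range(1024)])
print(match_scores(probe, gallery).shape)  # (1024,)
```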
Occlusions are a common occurrence in unconstrained face images. Single-image 3D reconstruction from such face images is often corrupted by the presence of occlusions. Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing methods are limited to generating only a single solution. To address both of these challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image. It consists of three components: a global + local shape fitting process, a graph-neural-network-based mesh VAE, and a determinantal point process that promotes diversity during the iterative optimization procedure. Quantitative and qualitative comparisons of 3D reconstruction on occluded faces show that Diverse3DFace can estimate 3D shapes that are consistent with the visible regions of the target image while exhibiting high, yet realistic, diversity over the occluded regions. On face images occluded by masks, glasses, and other random objects, Diverse3DFace generates a distribution of 3D shapes with significantly higher diversity over the occluded regions than the baselines. Moreover, our samples closest to the ground truth have 40% lower error than the singular reconstructions of existing methods.
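To show the diversity-promoting component in isolation, the sketch below performs greedy MAP selection under a determinantal point process: it selects the subset of candidate reconstructions (here represented as generic feature vectors) that maximizes the log-determinant of their similarity kernel, which rewards mutually dissimilar samples. The RBF kernel, the function `dpp_greedy_select`, and the random candidates are generic illustrations, not the paper's exact optimization.

```python
import numpy as np

def dpp_greedy_select(features, k, gamma=1.0):
    """Greedy MAP under an RBF-kernel DPP: at each step add the candidate that
    most increases log det of the selected similarity submatrix, which favors
    samples dissimilar from those already chosen."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    L = np.exp(-gamma * d2) + 1e-6 * np.eye(len(features))  # PSD similarity kernel
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            idx = selected + [i]
            _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if logdet > best_gain:
                best, best_gain = i, logdet
        selected.append(best)
    return selected

# e.g. pick 5 mutually diverse candidates out of 100 latent samples from a mesh VAE
candidates = np.random.default_rng(0).standard_normal((100, 16))
print(dpp_greedy_select(candidates, k=5))
```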
We present a method to search a probe (or query) image representation against a large gallery in the encrypted domain. We require the probe and gallery images to be represented in terms of a fixed-length representation, which is typical of representations obtained from learned networks. Our encryption scheme is agnostic to how the fixed-length representation is obtained and can therefore be applied to any fixed-length representation in any application domain. Our method, dubbed HERS (Homomorphically Encrypted Representation Search), operates by (i) compressing the representation towards its estimated intrinsic dimensionality with minimal loss of accuracy, (ii) encrypting the compressed representation using the proposed fully homomorphic encryption scheme, and (iii) efficiently searching against a gallery of encrypted representations directly in the encrypted domain, without decrypting them. Numerical results on large face, fingerprint, and object datasets (e.g., ImageNet) show that, for the first time, accurate and fast image search within the encrypted domain is feasible at scale (500 seconds; $275\times$ speed-up over the state of the art for encrypted search against a gallery of 100 million). Code is available at https://github.com/human-analysis/hers-encrypted-image-search.
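A plaintext analogue of steps (i) and (iii) is sketched below: a linear compression towards a lower dimensionality (PCA is used here purely as a stand-in for the paper's learned compression) followed by inner-product search over a gallery. The encryption step (ii) is omitted entirely, and the value of `intrinsic_dim` and the random data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
gallery_raw = rng.standard_normal((10000, 512))  # fixed-length gallery features (placeholder)
probe_raw = rng.standard_normal(512)

# (i) Compress towards an estimated intrinsic dimensionality.
intrinsic_dim = 64
mean = gallery_raw.mean(axis=0)
_, _, vt = np.linalg.svd(gallery_raw - mean, full_matrices=False)
W = vt[:intrinsic_dim]                 # (64, 512) projection

def compress(x):
    z = W @ (x - mean)
    return z / np.linalg.norm(z)

gallery = np.stack([compress(g) for g in gallery_raw])

# (ii) In HERS, `gallery` and the probe would now be encrypted with FHE.
# (iii) Search: inner-product scores computed directly on the representations.
scores = gallery @ compress(probe_raw)
top5 = np.argsort(-scores)[:5]
print(top5, scores[top5])
```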
Many applications of representation learning, such as privacy preservation, algorithmic fairness, and domain adaptation, desire explicit control over the semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being invariant (independent) to a known semantic attribute. Solutions to invariant representation learning (IRepL) problems lead to a trade-off between utility and invariance when the two objectives compete. While existing works study bounds on this trade-off, two questions remain outstanding: 1) What is the exact trade-off between utility and invariance? and 2) What are the encoders (mapping the data to a representation) that achieve this trade-off, and how can we estimate them from training data? This paper addresses these questions for IRepL in reproducing kernel Hilbert spaces (RKHSs). Under the assumption that the distribution of a low-dimensional projection of the high-dimensional data is approximately normal, we derive a closed-form solution for the global optima of the underlying optimization problem for encoders in RKHSs. This yields closed formulae for a near-optimal trade-off, the corresponding optimal representation dimensionality, and the corresponding encoder(s). We also numerically quantify the trade-off on representative problems and compare it to the trade-offs achieved by baseline IRepL algorithms.
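The competing objectives can be made explicit with a generic formulation of this kind of trade-off, written here only to fix notation; the exact objective, the dependence measure, and the closed-form optimum are in the paper and are not reproduced:

$$\sup_{f \in \mathcal{H}} \; (1-\lambda)\,\mathrm{Dep}\big(f(X),\, Y\big) \;-\; \lambda\,\mathrm{Dep}\big(f(X),\, S\big), \qquad \lambda \in [0, 1],$$

where $f$ is an encoder restricted to an RKHS $\mathcal{H}$, $\mathrm{Dep}(\cdot,\cdot)$ is a kernelized dependence measure, $Y$ is the target attribute, $S$ is the semantic attribute to be invariant to, and sweeping $\lambda$ traces out the utility-invariance trade-off.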
The adversarial input generation problem has become central to establishing the robustness and trustworthiness of deep neural nets, especially when they are used in safety-critical application domains such as autonomous vehicles and precision medicine. It is also practically challenging for multiple reasons: scalability is a common issue owing to large-sized networks, and the generated adversarial inputs often lack important qualities such as naturalness and output-impartiality. We relate this problem to the task of patching neural nets, i.e., applying small changes to some of the network's weights so that the modified net satisfies a given property. Intuitively, a patch can be used to produce an adversarial input because the effect of changing the weights can also be brought about by changing the inputs instead. This work presents a novel technique to patch neural networks and an innovative approach to using it to produce input perturbations that are adversarial for the original net. We note that the proposed solution is significantly more effective than prior state-of-the-art techniques.
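The intuition that a weight change can be reproduced by an input change is exact for a first linear layer, since $W(x + \delta) = (W + \Delta W)x$ holds whenever $W\delta = \Delta W\,x$, which is a least-squares solve. The sketch below illustrates only this correspondence on a single hypothetical layer; it is not the paper's patching technique and says nothing about how naturalness or output-impartiality are enforced.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))           # first-layer weights (hypothetical net)
x = rng.standard_normal(64)                 # original input
dW = 0.01 * rng.standard_normal((32, 64))   # a small "patch" on the weights

# Effect of the patch on the first layer's pre-activation: (W + dW) x - W x.
target_shift = dW @ x

# Find an input perturbation delta with the same effect: W @ delta = dW @ x.
delta, *_ = np.linalg.lstsq(W, target_shift, rcond=None)

print(np.allclose(W @ (x + delta), (W + dW) @ x, atol=1e-6))  # True (up to lstsq residual)
```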
Household environments are visually diverse. Embodied agents performing Vision-and-Language Navigation (VLN) in the wild must be able to handle this diversity while also following arbitrary language instructions. Recently, vision-language models like CLIP have shown great performance on the task of zero-shot object recognition. In this work, we ask whether these models are also capable of zero-shot language grounding. In particular, we utilize CLIP to tackle the novel problem of zero-shot VLN using natural-language referring expressions that describe target objects, in contrast to past work that used simple language templates describing object classes. We examine CLIP's capability to make sequential navigational decisions without any dataset-specific fine-tuning and study how it influences the path that an agent takes. Our results on the coarse-grained instruction-following task of REVERIE demonstrate the navigational capability of CLIP, surpassing the supervised baseline in terms of both success rate (SR) and success weighted by path length (SPL). More importantly, we quantitatively show that, when evaluated via Relative Change in Success (RCS), our CLIP-based zero-shot approach generalizes better than state-of-the-art fully supervised learning approaches, showing consistent performance across environments.
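A minimal sketch of using an off-the-shelf CLIP model to ground a referring expression at a single navigation step is given below: each candidate view is scored against the instruction and the highest-scoring direction is chosen. It uses the Hugging Face `transformers` CLIP interface; the function `pick_next_view`, the greedy decision rule, and the placeholder image paths are illustrative assumptions, not the paper's exact action space or stopping criterion.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_next_view(instruction, candidate_views):
    """Score each candidate view (a PIL image of a navigable direction) against
    the referring expression and return the index of the best match."""
    inputs = processor(text=[instruction], images=candidate_views,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # (num_views, 1)
    return int(logits_per_image.squeeze(1).argmax())

# Hypothetical usage at one navigation step (placeholder image paths):
views = [Image.open(f"view_{i}.jpg") for i in range(4)]
best = pick_next_view("the white pillow on the armchair in the living room", views)
```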
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource, vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi along two dimensions: (a) the creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass awareness platform; and (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly a factor of 4 to enable its deployment on low-resource edge devices and in areas with little to no internet connectivity. We also present preliminary evaluations of using the developed machine translation model to assist volunteers involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used to build the translation model, but also engaged nearly 850 community members who can help bring Gondi onto the internet.